
    Adjunction in hierarchical phrase-based translation

    Hierarchical Phrase-Based SMT models are compositional by virtue of their formal reliance on Synchronous Context-Free Grammar (SCFG). There is, however, no guarantee that target-side translations are compositional or even grammatical. Linguistic enrichment methods often exploit syntactic cues for better target-side rewriting or source-side rule selection, but these cues generally describe only one side of the data and therefore have little bearing on compositional translation equivalence. This dissertation takes adjunction as a source of linguistic information for translation modelling. We consider the constituents involved in adjunction (adjuncts, in the broad sense of syntactic modifiers) and investigate the role they play in compositional, phrase-based translation. This dissertation shows that adjunction, which has previously been applied in Syntax-Based SMT through Synchronous Tree-Adjunction Grammar, is also relevant in an asyntactic, SCFG-based paradigm. We first show, through a corpus study, that adjunction is largely synchronous in English-French, and that synchronous adjunction reflects translation compositionality rather than mere syntactic similarity. In translation modelling for reordering-intensive language pairs such as English-Chinese and English-Japanese, we find that driving recursion through soft adjunct-based constraints not only extends the reordering capacity of hierarchical phrase-based models but also effectively filters short-range, asyntactic rules, while adjunct optionality can be exploited to further enrich translation grammars. Experiments with preordering confirm both the utility of adjunction for guiding long-range reorderings and the local nature of adjunction decisions. Comparing adjuncts and constituents shows that constituents are more informative overall for translation modelling, while adjuncts behave more homogeneously and appear to form the largest part of synchronously applicable constituents.
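As a rough illustration of what a soft adjunct-based constraint on recursion could look like, the sketch below assumes a feature design in the spirit of soft syntactic constraints for hierarchical models (Marton and Resnik, 2008), adapted here to adjunct spans. It is not the dissertation's actual feature set. Each nonterminal gap of a hierarchical rule, given as a half-open source span, is compared against source spans annotated as adjuncts. Exact matches are counted as a reward feature and crossing brackets as a penalty feature, and a log-linear decoder would weight both rather than forbid any rule outright. The names `Span`, `adjunct_features`, `soft_constraint_score`, `adj_match`, and `adj_cross` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Span:
    start: int  # inclusive source-token index
    end: int    # exclusive source-token index


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap without either one containing the other."""
    overlap = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return overlap and not (a_in_b or b_in_a)


def adjunct_features(gaps: Iterable[Span], adjuncts: List[Span]) -> Dict[str, int]:
    """Hypothetical soft-constraint features for one rule application:
    adj_match counts gaps that coincide exactly with an adjunct span,
    adj_cross counts gaps that cross the boundary of some adjunct span."""
    adjunct_set = set(adjuncts)
    match = 0
    cross = 0
    for gap in gaps:
        if gap in adjunct_set:
            match += 1
        if any(crosses(gap, adj) for adj in adjuncts):
            cross += 1
    return {"adj_match": match, "adj_cross": cross}


def soft_constraint_score(features: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of features, as added to a log-linear model score.
    Features missing from the weight table contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


if __name__ == "__main__":
    # Source tokens: the(0) cat(1) on(2) the(3) mat(4) sleeps(5)
    # Assumed annotation: "on the mat" is an adjunct spanning [2, 5).
    adjuncts = [Span(2, 5)]
    weights = {"adj_match": 0.5, "adj_cross": -1.0}  # illustrative values only
    for gap in (Span(2, 5), Span(1, 3)):
        feats = adjunct_features([gap], adjuncts)
        print(gap, feats, soft_constraint_score(feats, weights))
```

Because the constraint is soft, a gap that crosses an adjunct ("cat on" above) is merely penalized, so the tuned weights, not a hard filter, decide how strongly recursion is steered toward adjunct boundaries.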